AITopics | lead optimization

Collaborating Authors

lead optimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Lo-Hi: Practical ML Drug Discovery Benchmark

Neural Information Processing SystemsFeb-17-2026, 03:11:17 GMT

Finding new drugs is getting harder and harder. One of the hopes of drug discovery is to use machine learning models to predict molecular properties.

artificial intelligence, machine learning, molecule, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.98)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

cb82f1f97ad0ca1d92df852a44a3bd73-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsOct-9-2025, 07:34:48 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.98)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

Refine Drugs, Don't Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery

Kaech, Benno, Wyss, Luis, Borgwardt, Karsten, Grasso, Gianvito

arXiv.org Artificial IntelligenceOct-1-2025

We introduce InVirtuoGen, a discrete flow generative model for fragmented SMILES for de novo and fragment-constrained generation, and target-property/lead optimization of small molecules. The model learns to transform a uniform source over all possible tokens into the data distribution. Unlike masked models, its training loss accounts for predictions on all sequence positions at every denoising step, shifting the generation paradigm from completion to refinement, and decoupling the number of sampling steps from the sequence length. For \textit{de novo} generation, InVirtuoGen achieves a stronger quality-diversity pareto frontier than prior fragment-based models and competitive performance on fragment-constrained tasks. For property and lead optimization, we propose a hybrid scheme that combines a genetic algorithm with a Proximal Property Optimization fine-tuning strategy adapted to discrete flows. Our approach sets a new state-of-the-art on the Practical Molecular Optimization benchmark, measured by top-10 AUC across tasks, and yields higher docking scores in lead optimization than previous baselines. InVirtuoGen thus establishes a versatile generative foundation for drug discovery, from early hit finding to multi-objective lead optimization. We further contribute to open science by releasing pretrained checkpoints and code, making our results fully reproducible\footnote{https://github.com/invirtuolabs/InVirtuoGen_results}.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2509.26405

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization

Wang, Ziqing, Wen, Yibo, Pattie, William, Luo, Xiao, Wu, Weimin, Hu, Jerry Yao-Chieh, Pandey, Abhishek, Liu, Han, Ding, Kaize

arXiv.org Artificial IntelligenceSep-29-2025

Lead optimization in drug discovery requires efficiently navigating vast chemical space through iterative cycles to enhance molecular properties while preserving structural similarity to the original lead compound. Despite recent advances, traditional optimization methods struggle with sample efficiency-achieving good optimization performance with limited oracle evaluations. Large Language Models (LLMs) provide a promising approach through their in-context learning and instruction following capabilities, which align naturally with these iterative processes. However, existing LLM-based methods fail to leverage this strength, treating each optimization step independently. To address this, we present POLO (Preference-guided multi-turn Optimization for Lead Optimization), which enables LLMs to learn from complete optimization trajectories rather than isolated steps. At its core, POLO introduces Preference-Guided Policy Optimization (PGPO), a novel reinforcement learning algorithm that extracts learning signals at two complementary levels: trajectory-level optimization reinforces successful strategies, while turn-level preference learning provides dense comparative feedback by ranking intermediate molecules within each trajectory. Through this dual-level learning from intermediate evaluation, POLO achieves superior sample efficiency by fully exploiting each costly oracle call. Extensive experiments demonstrate that POLO achieves 84% average success rate on single-property tasks (2.3x better than baselines) and 50% on multi-property tasks using only 500 oracle evaluations, significantly advancing the state-of-the-art in sample-efficient molecular optimization.

large language model, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2509.21737

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

Jin, Yaowei, Wang, Junjie, Xiang, Wenkai, Cao, Duanhua, Teng, Dan, Fan, Zhehuan, Xiong, Jiacheng, Sheng, Xia, Zeng, Chuanlong, An, Duo, Zheng, Mingyue, Zheng, Shuangjia, Shi, Qian

arXiv.org Artificial IntelligenceAug-1-2025

Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models is unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with detailed theoretical foundation, training, and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.13762

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Add feedback

A 3D pocket-aware and affinity-guided diffusion model for lead optimization

Qiao, Anjie, Xie, Junjie, Huang, Weifeng, Zhang, Hao, Rao, Jiahua, Zheng, Shuangjia, Yang, Yuedong, Wang, Zhen, Li, Guo-Bo, Lei, Jinping

arXiv.org Artificial IntelligenceMay-1-2025

Molecular optimization, aimed at improving binding affinity or other molecular properties, is a crucial task in drug discovery that often relies on the expertise of medicinal chemists. Recently, deep learning-based 3D generative models showed promise in enhancing the efficiency of molecular optimization. However, these models often struggle to adequately consider binding affinities with protein targets during lead optimization. Herein, we propose a 3D pocket-aware and affinity-guided diffusion model, named Diffleop, to optimize molecules with enhanced binding affinity. The model explicitly incorporates the knowledge of protein-ligand binding affinity to guide the denoising sampling for molecule generation with high affinity. The comprehensive evaluations indicated that Diffleop outperforms baseline models across multiple metrics, especially in terms of binding affinity.

artificial intelligence, machine learning, molecule, (16 more...)

arXiv.org Artificial Intelligence

2504.21065

Country: Asia > China (0.47)

Genre: Research Report (0.65)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GenMol: A Drug Discovery Generalist with Discrete Diffusion

Lee, Seul, Kreis, Karsten, Veccham, Srimukh Prasad, Liu, Meng, Reidenbach, Danny, Peng, Yuxing, Paliwal, Saee, Nie, Weili, Vahdat, Arash

arXiv.org Artificial IntelligenceJan-10-2025

Drug discovery is a complex process that involves multiple scenarios and stages, such as fragment-constrained molecule generation, hit generation and lead optimization. However, existing molecular generative models can only tackle one or two of these scenarios and lack the flexibility to address various aspects of the drug discovery pipeline. In this paper, we present Generalist Molecular generative model (GenMol), a versatile framework that addresses these limitations by applying discrete diffusion to the Sequential Attachment-based Fragment Embedding (SAFE) molecular representation. GenMol generates SAFE sequences through non-autoregressive bidirectional parallel decoding, thereby allowing utilization of a molecular context that does not rely on the specific token ordering and enhanced computational efficiency. Moreover, under the discrete diffusion framework, we introduce fragment remasking, a strategy that optimizes molecules by replacing fragments with masked tokens and regenerating them, enabling effective exploration of chemical space. GenMol significantly outperforms the previous GPT-based model trained on SAFE representations in de novo generation and fragment-constrained generation, and achieves state-of-the-art performance in goal-directed hit generation and lead optimization. These experimental results demonstrate that GenMol can tackle a wide range of drug discovery tasks, providing a unified and versatile approach for molecular design.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.06158

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Lin, Haitao, Zhao, Guojiang, Zhang, Odin, Huang, Yufei, Wu, Lirong, Liu, Zicheng, Li, Siyuan, Tan, Cheng, Gao, Zhifeng, Li, Stan Z.

arXiv.org Artificial IntelligenceJun-16-2024

Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair comparisons and inconclusive insights. To address this dilemma, we propose CBGBench, a comprehensive benchmark for SBDD, that unifies the task as a generative heterogeneous graph completion, analogous to fill-in-the-blank of the 3D complex binding graph. By categorizing existing methods based on their attributes, CBGBench facilitates a modular and extensible framework that implements various cutting-edge methods. Secondly, a single task on \textit{de novo} molecule generation can hardly reflect their capabilities. To broaden the scope, we have adapted these models to a range of tasks essential in drug design, which are considered sub-tasks within the graph fill-in-the-blank tasks. These tasks include the generative designation of \textit{de novo} molecules, linkers, fragments, scaffolds, and sidechains, all conditioned on the structures of protein pockets. Our evaluations are conducted with fairness, encompassing comprehensive perspectives on interaction, chemical properties, geometry authenticity, and substructure validity. We further provide the pre-trained versions of the state-of-the-art models and deep insights with analysis from empirical studies. The codebase for CBGBench is publicly accessible at \url{https://github.com/Edapinenut/CBGBench}.

atom, fragment, molecule, (15 more...)

arXiv.org Artificial Intelligence

2406.1084

Genre: Research Report > Promising Solution (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Deep Lead Optimization: Leveraging Generative AI for Structural Modification

Zhang, Odin, Lin, Haitao, Zhang, Hui, Zhao, Huifeng, Huang, Yufei, Huang, Yuansheng, Jiang, Dejun, Hsieh, Chang-yu, Pan, Peichen, Hou, Tingjun

arXiv.org Artificial IntelligenceApr-29-2024

The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead optimization, which refines existing molecules into drug candidates. Among them, lead optimization plays an important role in real-world drug design. For example, it can enable the development of me-better drugs that are chemically distinct yet more effective than the original drugs. It can also facilitate fragment-based drug design, transforming virtual-screened small ligands with low affinity into first-in-class medicines. Despite its importance, automated lead optimization remains underexplored compared to the well-established de novo generative models, due to its reliance on complex biological and chemical knowledge. To bridge this gap, we conduct a systematic review of traditional computational methods for lead optimization, organizing these strategies into four principal sub-tasks with defined inputs and outputs. This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD. Additionally, we introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization. Through this lens, de novo design can incorporate strategies from lead optimization to address the challenge of generating hard-to-synthesize molecules; inversely, lead optimization can benefit from the innovations in de novo design by approaching it as a task of generating molecules conditioned on certain substructures.

lead optimization, molecule, scaffold, (14 more...)

arXiv.org Artificial Intelligence

2404.1923

Country:

Oceania > Samoa (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Add feedback

Lo-Hi: Practical ML Drug Discovery Benchmark

Steshin, Simon

arXiv.org Artificial IntelligenceOct-10-2023

Finding new drugs is getting harder and harder. One of the hopes of drug discovery is to use machine learning models to predict molecular properties. That is why models for molecular property prediction are being developed and tested on benchmarks such as MoleculeNet. However, existing benchmarks are unrealistic and are too different from applying the models in practice. We have created a new practical \emph{Lo-Hi} benchmark consisting of two tasks: Lead Optimization (Lo) and Hit Identification (Hi), corresponding to the real drug discovery process. For the Hi task, we designed a novel molecular splitting algorithm that solves the Balanced Vertex Minimum $k$-Cut problem. We tested state-of-the-art and classic ML models, revealing which works better under practical settings. We analyzed modern benchmarks and showed that they are unrealistic and overoptimistic. Review: https://openreview.net/forum?id=H2Yb28qGLV Lo-Hi benchmark: https://github.com/SteshinSS/lohi_neurips2023 Lo-Hi splitter library: https://github.com/SteshinSS/lohi_splitter

benchmark, dataset, molecule, (15 more...)

arXiv.org Artificial Intelligence

2310.06399

Country:

North America > United States > Washington > Benton County > Richland (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.98)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback